190 research outputs found

    Correcting batch effects in single-cell RNA sequencing data by matching mutual nearest neighbours

    The presence of batch effects is a well-known problem in experimental data analysis, and single-cell RNA sequencing (scRNA-seq) is no exception. Large-scale scRNA-seq projects that generate data from different laboratories and at different times are rife with batch effects that can fatally compromise integration and interpretation of the data. In such cases, computational batch correction is critical for eliminating uninteresting technical factors and obtaining valid biological conclusions. However, existing methods assume that the composition of cell populations is either known or the same across batches. Here, we present a new strategy for batch correction based on the detection of mutual nearest neighbours in the high-dimensional expression space. Our approach does not rely on pre-defined or equal population compositions across batches, only requiring that a subset of the population be shared between batches. We demonstrate the superiority of our approach over existing methods on a range of simulated and real scRNA-seq data sets. We also show how our method can be applied to integrate scRNA-seq data from two separate studies of early embryonic development.

    Adjustments to the reference dataset design improve cell type label transfer

    The transfer of cell type labels from pre-annotated (reference) to newly collected data is an important task in single-cell data analysis. As the number of publicly available annotated datasets which can be used as reference, as well as the number of computational methods for cell type label transfer, are constantly growing, rationales to understand and decide which reference design and which method to use for a particular query dataset are needed. Using detailed data visualisations and interpretable statistical assessments, we benchmark a set of popular cell type annotation methods, test their performance on different cell types and study the effects of the design of reference data (e.g., cell sampling criteria, inclusion of multiple datasets in one reference, gene set selection) on the reliability of predictions. Our results highlight the need for further improvements in label transfer methods, as well as preparation of high-quality pre-annotated reference data of adequate sampling from all cell types of interest, for more reliable annotation of new datasets.

    Adjustments to the reference dataset design improves cell type label transfer

    The transfer of cell type labels from previously annotated (reference) to newly collected data is an important task in single-cell data analysis. As the number of publicly available annotated datasets which can be used as a reference, as well as the number of computational methods for cell type label transfer, are constantly growing, rationales to understand and decide which reference design and which method to use for a particular query dataset are needed. Here, we benchmark a set of five popular cell type annotation methods, study their performance on different cell types and highlight the importance of the design of the reference data (number of cell samples for each cell type, inclusion of multiple datasets in one reference, gene set selection, etc.) for more reliable predictions.

    Single-cell multi-omics and lineage tracing to dissect cell fate decision-making

    The concept of cell fate relates to the future identity of a cell, and its daughters, which is obtained via cell differentiation and division. Understanding, predicting, and manipulating cell fate has been a long-sought goal of developmental and regenerative biology. Recent insights obtained from single-cell genomic and integrative lineage-tracing approaches have further helped to identify molecular features predictive of cell fate. In this perspective, we discuss these approaches with a focus on theoretical concepts and future directions of the field to dissect molecular mechanisms underlying cell fate.

    An Explicit Framework for Interaction Nets

    Interaction nets are a graphical formalism inspired by Linear Logic proof-nets often used for studying higher-order rewriting, e.g. β-reduction. Traditional presentations of interaction nets are based on graph theory and rely on elementary properties of graph theory. We give here a more explicit presentation based on notions borrowed from Girard's Geometry of Interaction: interaction nets are presented as partial permutations and a composition of nets, the gluing, is derived from the execution formula. We then define contexts and reduction as the context closure of rules. We prove strong confluence of the reduction within our framework and show how interaction nets can be viewed as the quotient of some generalized proof-nets.

    Reconstructing Gene Regulatory Networks That Control Hematopoietic Commitment.

    Hematopoietic stem cells (HSCs) reside at the apex of the hematopoietic hierarchy, possessing the ability to self-renew and differentiate toward all mature blood lineages. Along with more specialized progenitor cells, HSCs have an essential role in maintaining a healthy blood system. Incorrect regulation of cell fate decisions in stem/progenitor cells can lead to an imbalance of mature blood cell populations, a situation seen in diseases such as leukemia. Transcription factors, acting as part of complex regulatory networks, are known to play an important role in regulating hematopoietic cell fate decisions. Yet, discovering the interactions present in these networks remains a big challenge. Here, we discuss a computational method that uses single-cell gene expression data to reconstruct Boolean gene regulatory network models and show how this technique can be applied to enhance our understanding of transcriptional regulation in hematopoiesis. Work in the author’s laboratory is supported by grants from the Wellcome, Bloodwise, Cancer Research UK, NIH-NIDDK and core support grants by the Wellcome to the Cambridge Institute for Medical Research and Wellcome & MRC Cambridge Stem Cell Institute. F.K.H. is a recipient of a Medical Research Council PhD Studentship.

    Towards reliable quantification of cell state velocities

    A few years ago, it was proposed to use the simultaneous quantification of unspliced and spliced messenger RNA (mRNA) to add a temporal dimension to high-throughput snapshots of single-cell RNA sequencing data. This concept can yield additional insight into the transcriptional dynamics of the biological systems under study. However, current methods for inferring cell state velocities from such data (known as RNA velocities) are afflicted by several theoretical and computational problems, hindering realistic and reliable velocity estimation. We discuss these issues and propose new solutions for addressing some of the current challenges in consistency of data processing, velocity inference and visualisation. We translate our computational conclusions into two velocity analysis tools: one detailed method κ-velo and one heuristic method eco-velo, each of which uses a different set of assumptions about the data.

    Deep generative modeling for single-cell transcriptomics.

    Single-cell transcriptome measurements can reveal unexplored biological diversity, but they suffer from technical noise and bias that must be modeled to account for the resulting uncertainty in downstream analyses. Here we introduce single-cell variational inference (scVI), a ready-to-use scalable framework for the probabilistic representation and analysis of gene expression in single cells (https://github.com/YosefLab/scVI). scVI uses stochastic optimization and deep neural networks to aggregate information across similar cells and genes and to approximate the distributions that underlie observed expression values, while accounting for batch effects and limited sensitivity. We used scVI for a range of fundamental analysis tasks including batch correction, visualization, clustering, and differential expression, and achieved high accuracy for each task.

    Reconstructing cell cycle and disease progression using deep learning

    We show that deep convolutional neural networks combined with nonlinear dimension reduction enable reconstructing biological processes based on raw image data. We demonstrate this by reconstructing the cell cycle of Jurkat cells and disease progression in diabetic retinopathy. In further analysis of Jurkat cells, we detect and separate a subpopulation of dead cells in an unsupervised manner and, in classifying discrete cell cycle stages, we reach a sixfold reduction in error rate compared to a recent approach based on boosting on image features. In contrast to previous methods, deep learning-based predictions are fast enough for on-the-fly analysis in an imaging flow cytometer.

    Normalizing single-cell RNA sequencing data: challenges and opportunities

    Single-cell transcriptomics is becoming an important component of the molecular biologist's toolkit. A critical step when analyzing data generated using this technology is normalization. However, normalization is typically performed using methods developed for bulk RNA sequencing or even microarray data, and the suitability of these methods for single-cell transcriptomics has not been assessed. Here we discuss commonly used normalization approaches and illustrate how these can produce misleading results. Finally, we present alternative approaches and provide recommendations for single-cell RNA sequencing users.
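
    As a concrete reference point for the kind of bulk-style normalization the abstract refers to, the sketch below applies plain library-size scaling to a cells-by-genes count matrix. It is a generic illustration, not one of the approaches recommended by the authors. The function name and the `target_sum` default are assumptions.

```python
import numpy as np

def library_size_normalize(counts, target_sum=1e4):
    """Scale each cell (row) so that its counts sum to target_sum.

    Hypothetical helper for illustration; `counts` is a cells x genes array.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    if np.any(totals == 0):
        raise ValueError("cells with zero total counts cannot be scaled")
    return counts / totals * target_sum

# Toy data (assumed): the second cell was sequenced ten times deeper.
counts = np.array([[1, 3], [10, 30]])
normalized = library_size_normalize(counts)
print(normalized)
```

    After scaling, the two toy cells have identical profiles, which is the intended effect when depth is the only difference. The same scaling can mislead when cells differ in composition: if a few genes dominate one cell's total, every other gene in that cell is pushed down even though its true expression is unchanged.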